Protein structures can be studied as complex networks of interacting
amino acids. We study proteins of different structural classes from
the network perspective. Our results indicate that proteins,
regardless of their structural class, show the small-world network
property. Various network parameters offer insight into the structural
organisation of proteins and provide indications of modularity in
protein networks.
I. INTRODUCTION

Biological systems have been studied as networks at different levels:
protein-protein interaction networks, metabolic pathway networks, gene
regulatory networks, and proteins as networks of amino acids. Proteins
are biological macromolecules made up of a linear chain of amino acids
and are organised into a three-dimensional structure comprising
different secondary structural elements. They perform diverse
biochemical functions and also provide the structural basis of living
cells. It is important to understand how proteins consistently fold
into their native-state structures and how structure relates to
function. Network analysis of protein structures is one such attempt
to understand the possible relevance of various network parameters.

There have been several efforts to study proteins as networks. Aszodi
and Taylor compared the linear chain of amino acids in a protein and
its three-dimensional structure with the help of two topological
indices, connectedness and effective chain length, related to the path
length and the degree of foldedness of the chain. Kannan and
Vishveshwara used the graph spectral method to detect side-chain
clusters in three-dimensional structures of proteins. In recent years,
with the elaboration of network properties in a variety of real
networks, Vendruscolo et al. showed that protein structures have
small-world topology. They also studied transition state ensemble
(TSE) structures to identify the key residues that act as "hubs" in
the network of interactions that stabilises the structure of the
transition state. Greene and Higman studied the short-range and
long-range interaction networks in protein structures and showed that
the long-range interaction network is not small-world and that its
degree distribution, while having an underlying scale-free behaviour,
is dominated by an exponential term indicative of a single-scale
system. Atilgan et al. studied the network properties of the core and
surface of globular protein structures and established that,
regardless of size, the cores have the same local packing
arrangements. They also explained, with an example of the binding of
two proteins, how the small-world topology could be useful in the
efficient and effective dissipation of energy generated upon binding.

In this study, we model the native-state protein structure as a
network made of its constituent amino acids and their interactions.
The Cα atom of each amino acid is used as a node, and two such nodes
are said to be linked if they are less than or equal to a threshold
distance apart from each other. We use 7 Å as the threshold distance.
Our results show that proteins are small-world networks regardless of
their structural classification across the four major groups
enumerated in the Structural Classification of Proteins (SCOP) [10].
We also highlight the differences in some of the network properties
among these classes. Our studies are indicative of the modular nature
of these networks.
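
The construction described above can be sketched in a few lines of
Python. This is a minimal illustration under stated assumptions, not
the code used in the study: the function name build_contact_network
and the toy coordinates are hypothetical, and in practice the Cα
coordinates would be read from a PDB file.

```python
import math


def build_contact_network(ca_coords, cutoff=7.0):
    """Link residues whose C-alpha atoms are at most `cutoff` angstroms apart.

    ca_coords is a list of (x, y, z) tuples, one per residue.
    Returns an adjacency mapping: residue index -> set of linked indices.
    """
    n = len(ca_coords)
    adjacency = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                adjacency[i].add(j)
                adjacency[j].add(i)
    return adjacency


# Hypothetical coordinates: four residues on a straight line, 3.8 angstroms apart.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
network = build_contact_network(toy_coords)
```

With these toy coordinates only consecutive residues fall within the
7 Å threshold, so the resulting network is a simple chain.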



II. METHODOLOGY 

The four structural classes (from SCOP) of proteins chosen are: α, β,
α + β, and α/β. The α proteins are composed predominantly of α
helices, and the β proteins of β sheets. The α + β proteins mainly
have anti-parallel β sheets, whereas those in α/β consist mainly of
parallel β sheets. We consider 20 proteins from each of these classes,
with sizes ranging from 73 to 2359 amino acids. The structural data
are obtained from the Protein Data Bank (PDB).

FIG. 1: (a) The L-C plot of proteins from four structural classes.
(b) Increase in the L of proteins with logarithmic increase in size
(N). Random controls are indicated by an arrow in both the figures.

The parameters used to characterise the network are:

(i) The Degree ($k_i$) of a node i is the number of nodes to which it
is directly connected. The Average Degree, K, of a network with N
nodes is defined as

$$K = \frac{1}{N} \sum_{i=1}^{N} k_i .$$

(ii) The Average Shortest Path Length (L) is defined as

$$L = \frac{2}{N(N-1)} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} L_{ij} ,$$

where $L_{ij}$ is the shortest path length between nodes i and j.
The Diameter (D) of the network is the largest of all the shortest
path lengths.

(iii) The Clustering Coefficient ($C_i$) for a node i is defined as
the ratio of the number of links that exist among its nearest
neighbours to the maximum number of possible links among them. The
Average Clustering Coefficient (C) of the network is defined as

$$C = \frac{1}{N} \sum_{i=1}^{N} C_i .$$

A network is a "small-world network" if it has high C and if its L
scales logarithmically with N. A network lacking a characteristic
degree and having a degree distribution of power-law form is known as
a "scale-free network" [15].
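
As an illustration of how K, L, D, and C can be computed for a network
stored as an adjacency mapping, consider the following sketch. The
function names and the toy network are hypothetical; the code assumes
a connected, undirected network with at least two nodes, and it
assigns a clustering coefficient of zero to nodes with fewer than two
neighbours, which is a convention chosen here and not stated in the
text.

```python
from collections import deque


def average_degree(adjacency):
    """K: mean number of links per node."""
    return sum(len(nbrs) for nbrs in adjacency.values()) / len(adjacency)


def shortest_path_lengths(adjacency, source):
    """Breadth-first search distances (in links) from `source`."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adjacency[node]:
            if nbr not in distances:
                distances[nbr] = distances[node] + 1
                queue.append(nbr)
    return distances


def path_length_and_diameter(adjacency):
    """Return (L, D); assumes a connected network with at least two nodes."""
    n = len(adjacency)
    total, diameter = 0, 0
    for source in adjacency:
        distances = shortest_path_lengths(adjacency, source)
        if len(distances) != n:
            raise ValueError("network is not connected")
        total += sum(distances.values())
        diameter = max(diameter, max(distances.values()))
    # Each unordered pair was counted twice (once from each end),
    # so dividing by N(N-1) gives the mean over distinct pairs.
    return total / (n * (n - 1)), diameter


def average_clustering(adjacency):
    """C: mean over nodes of the fraction of neighbour pairs that are linked."""
    total = 0.0
    for nbrs in adjacency.values():
        nbrs = list(nbrs)
        k = len(nbrs)
        if k < 2:
            continue  # assumption: such nodes contribute C_i = 0
        links = sum(
            1
            for a in range(k)
            for b in range(a + 1, k)
            if nbrs[b] in adjacency[nbrs[a]]
        )
        total += 2.0 * links / (k * (k - 1))
    return total / len(adjacency)


# Hypothetical toy network: a triangle (0, 1, 2) with node 3 attached to node 2.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
K = average_degree(toy)
L, D = path_length_and_diameter(toy)
C = average_clustering(toy)
```

For the toy network this gives K = 2, L = 4/3, D = 2, and C = 7/12.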

III. RESULTS 

A. Network parameters for different structural 
classes of proteins 

We calculate L and C for each protein. As controls, we calculate the
L and C of random graphs and one-dimensional (1-d) regular graphs of
the same N and K. Figure 1(a) shows the L-C plot for the proteins and
their random controls (indicated by an arrow). The averages of the
distributions of L and C for the protein networks are 6.88 ± 2.61 and
0.553 ± 0.027, respectively. The corresponding values for the random
controls are L_random = 2.791 ± 0.348 and C_random = 0.031 ± 0.022,
and those for the comparable 1-d regular graphs are
L_regular = 29.0 ± 25.97 and C_regular = 0.643 ± 0.004.
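
The text does not specify how the control graphs were generated. One
plausible construction, shown here only as a sketch, uses a ring
lattice for the 1-d regular graph and a random graph with a fixed
number of nodes and links for the random control. Both function names
are hypothetical, and the periodic boundary and the even value of K
required by ring_lattice are assumptions of this sketch.

```python
import random


def ring_lattice(n, k):
    """1-d regular graph: each node is linked to its k/2 nearest
    neighbours on either side (k must be even and smaller than n)."""
    if k % 2 != 0 or k >= n:
        raise ValueError("k must be even and smaller than n")
    adjacency = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            adjacency[i].add(j)
            adjacency[j].add(i)
    return adjacency


def random_graph(n, n_links, seed=None):
    """Random graph with n nodes and exactly n_links distinct links."""
    rng = random.Random(seed)
    all_pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    adjacency = {i: set() for i in range(n)}
    for i, j in rng.sample(all_pairs, n_links):
        adjacency[i].add(j)
        adjacency[j].add(i)
    return adjacency


# Controls matching a hypothetical network with N = 10 and K = 4 (20 links).
lattice = ring_lattice(10, 4)
control = random_graph(10, 20, seed=0)
```

Both controls have the same N and the same average degree as the
network they are compared with, so differences in L and C reflect
topology and not size or link density.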

The Kolmogorov-Smirnov test shows that the differences between the L
and C of the proteins and those of random graphs or regular lattices
(the regular lattices are not shown in Fig. 1(a)) are statistically
significant. Thus, these protein networks have a significantly higher
clustering coefficient than their random counterparts, and their L
and C values fall between those of the random and regular networks in
the L-C plot.
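
For reference, the two-sample Kolmogorov-Smirnov statistic is the
largest vertical distance between the empirical cumulative
distribution functions of the two samples. The sketch below computes
only this statistic; the function name is hypothetical, the sample
values are synthetic and are not data from this study, and the
conversion of the statistic to a significance level is omitted.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Maximum absolute difference between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # Both empirical CDFs are step functions that change only at data
    # points, so it is enough to compare them at every observed value.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Synthetic samples for illustration only.
separated = ks_statistic([1, 2, 3], [4, 5, 6])
overlapping = ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])
```

Completely separated samples give the maximum value of 1, and
identical samples give 0.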

Fig. 1(b) shows the L of all proteins with different N and of their
random counterparts (indicated by an arrow). It can be seen that L
increases with log N, regardless of the structural classification of
the proteins, and the slope is higher than that of the random
controls. This property, along with the high C, indicates that
protein networks are "small-world networks".
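
One simple way to quantify the logarithmic scaling is an ordinary
least-squares fit of L against ln N, whose slope can then be compared
between proteins and their random controls. The sketch below is an
assumption about how such a comparison could be made, not the
procedure used in the study; the function name is hypothetical and
the input values are synthetic.

```python
import math


def fit_log_scaling(sizes, path_lengths):
    """Least-squares fit of L = slope * ln(N) + intercept."""
    xs = [math.log(n) for n in sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(path_lengths) / len(path_lengths)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, path_lengths))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic data that follow L = 1.5 ln N + 0.5 exactly (illustration only).
sizes = [100, 200, 400, 800]
lengths = [1.5 * math.log(n) + 0.5 for n in sizes]
slope, intercept = fit_log_scaling(sizes, lengths)
```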

B. Degree Distribution 

The distribution of degrees is an important property that
characterises the network topology. The degree distribution of a
random network is a Poisson distribution. Figure 2 shows the degree
distributions of the α, β, α + β, and α/β protein networks. These
distributions are bell-shaped and Poisson-like, and the number of
nodes with very high degree falls off rapidly. This is understandable,
as there is a physical limit on the number of amino acids that can
occupy the space within a certain distance around another amino acid.
Such system-specific restrictions have been identified by Amaral et
al. as responsible for the emergence of different classes of networks
with characteristic degree distributions. They observed that
preferential attachment to vertices in many real scale-free networks
[15] can be hindered by factors such as ageing of the vertices (e.g.
actor networks), the cost of adding links to the vertices, or the
limited capacity of a vertex (e.g. airport networks).
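
A degree distribution of the kind discussed above can be tabulated
from an adjacency mapping and compared with a Poisson distribution of
the same mean. The sketch below is illustrative only; the function
names and the toy network are hypothetical.

```python
import math
from collections import Counter


def degree_distribution(adjacency):
    """Fraction of nodes having each observed degree."""
    counts = Counter(len(nbrs) for nbrs in adjacency.values())
    n = len(adjacency)
    return {k: counts[k] / n for k in sorted(counts)}


def poisson_pmf(k, mean):
    """Probability of degree k under a Poisson distribution with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


# Hypothetical toy network: a triangle (0, 1, 2) with node 3 attached to node 2.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
observed = degree_distribution(toy)
mean_degree = sum(k * p for k, p in observed.items())
expected = {k: poisson_pmf(k, mean_degree) for k in observed}
```

Plotting observed against expected for a real protein network would
show how closely the bell-shaped distribution follows the Poisson
form.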

FIG. 2: Degree distributions for (a) α, (b) β, (c) α + β, and
(d) α/β proteins, 20 of each class.

C. Fibrous proteins

Most proteins are "globular proteins" in their three-dimensional
structure, where the polypeptide chain folds into a compact shape. In
contrast, "fibrous proteins" have a relatively simple, elongated
three-dimensional structure suited to their biological function (see
Fig. 3(b)). The "small-world" nature of globular proteins has been
argued to be required for enhancing the ease of dissipation of
disturbances. We studied fibrous proteins and compared their network
properties with those of globular proteins of comparable size. As
shown in the L-C plot in Fig. 3(a), fibrous proteins have larger L,
although their C values are similar to those of globular proteins.
Thus, in this respect, the fibrous proteins also show "small-world"
properties. The average diameter of the fibrous proteins (D = 15) was
found to be larger than that of the globular proteins (D = 8.57).
This is expected because of the elongated structure of fibrous
proteins. Despite this major difference in structure, the network
properties of fibrous and globular proteins are not much different.
This indicates that the "small-world" property of proteins is
ubiquitous and persists irrespective of structural differences.



D. α and β proteins

As seen earlier, both α and β proteins show small-world properties.
On finer analysis, we find that there is a marginal yet consistent
difference in the C of α and β proteins, as shown in Fig. 4(a). The
means of C for the α and β proteins studied are 0.588 and 0.538,
respectively. According to the Kolmogorov-Smirnov test, this
difference is statistically significant. Owing to the helical
structure of the α proteins, their amino acids are more densely
packed than those of the flat β sheets. This may contribute to the
slightly higher Average Clustering Coefficient of the α proteins.
Since the α + β and α/β proteins have a mixed composition of α
helices and β sheets, they do not show any clear distinction.

E. Change in C with N

The clustering coefficient characterises local organisation. For both
random and "scale-free" networks, C is expected to fall with
increasing size. It has been shown that, regardless of size, C
remains almost the same in the core of the protein. We show the
change in C with increasing protein size (N) in Fig. 4(b). The figure
shows that C does not change significantly with increasing protein
size. A similar result has been shown for the metabolic networks of
43 distinct organisms. This property is suggestive of potential
modularity in the topology of the protein networks.


IV. CONCLUSIONS 

Our results show that protein networks have the "small-world"
property regardless of their structural classification (α, β, α + β,
and α/β) and tertiary structure (globular and fibrous proteins), even
though small but definite differences exist between the α and β
classes, and between fibrous and globular proteins. The size
independence of the Average Clustering Coefficient in proteins points
toward an inherent modular organisation in the protein network.

FIG. 3: (a) L-C plot of fibrous and globular proteins. (b) Examples
of three-dimensional structures of a fibrous and a globular protein
(not to scale) with their PDB codes.

FIG. 4: (a) L-C plot for α and β proteins. Arrows indicate the means
of C for α and β proteins. (b) Change in C of proteins with
increasing N.

In the cell, starting from a linear chain of amino acids, the protein
folds into different secondary motifs, such as α helices, β sheets,
and their mixtures. These then assume three-dimensional tertiary
structures with helices, sheets, and random coils, folding to give
the final shape that is suited to carrying out the biochemical
function. This structure evolves in such a way as to confer stability
and also to allow the transmission of biochemical activity (binding
of ligand, allostery, etc.) for efficient functioning. Thus, the
networks built from such proteins are expected to show high
clustering and also to reflect their modular or hierarchically folded
organisation. Unlike other hierarchical networks that are modelled as
forming by replication of a core set of nodes and links, this network
first grows linearly, and then the polypeptide chain organises itself
in a modular manner at different levels (secondary and tertiary). The
evolution of this type of network architecture demands further study.